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Abstract: We present the Bayesian method for evaluating the evidence for a non-zero 
value of the leptonic mixing angle #13 and CP-violation in neutrino oscillation experiments. 
This is an application of the well-established method of Bayesian model selection, of which 
we give a concise and pedagogical overview. When comparing the hypothesis #13 = with 
hypotheses where #13 > using global data but excluding the recent reactor measurements, 
we obtain only a weak preference for a non-zero #13, even though the significance is over 3<r. 
We then add the reactor measurements one by one and show how the evidence for #13 > 
quickly increases. When including the Double Chooz, Daya Bay, and RENO data, 
the evidence becomes overwhelming with a posterior probability of the hypothesis #13 = 
below 10 -11 . Owing to the small amount of information on the CP-phase 5, very similar 
evidences are obtained for the CP-conserving and CP-violating hypotheses. Hence, there 
is, not unexpectedly, neither evidence for nor against leptonic CP-violation. However, 
when future experiments aiming to search for CP-violation have started taking data, this 
question will be of great importance and the method described here can be used as an 
important complement to standard analyses. 
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1 Introduction 

The phenomenon of neutrino oscillations has been established in a series of experiments 
using neutrinos from a wide range of sources, such as the Earth's atmosphere, the Sun, 
nuclear reactors, and man-made accelerators [1-3]. Neutrinos can only oscillate if they 
are massive, and there are many important questions raised related to neutrino masses, 
which are currently being investigated. These include the Dirac or Majorana nature of 
neutrino masses, and the question of whether total lepton-number is violated, information 
on which can be obtained through experiments searching for neutrinoless double beta decay 
[4-6]. In addition, there is the quest for the determination of the absolute values of the 
neutrino masses, which is most directly performed using single beta decay experiments 
[7, 8], although cosmological observations can also provide information [9]. 

However, there are still many questions related to oscillation experiments which remain 
to be answered. The oscillations observed until recently correspond to the dominant effec- 
tive two- flavor oscillation modes, driven by two mass-squared differences and two relatively 
large mixing angles, while the purpose of most current and future experiments is mainly to 
search for sub-leading effects. This includes the determination of the third leptonic mixing 
angle #13 and of the neutrino mass ordering, which can either be normal or inverted. Also, 
the existence of CP-violation in the lepton sector is a very important question. 

In fact, CP-violation in neutrino oscillations, which is a genuine three-flavour effect, is 
only possible for a non-zero value of #13, and any realistic possibility to determine the type 
of the neutrino mass ordering relies on #13 not being too small [10]. Therefore, the results 
on #13 from the present generation of experiments will be of crucial importance for the 
feasibility of future experiments aiming to determine the neutrino mass ordering or search 
for leptonic CP-violation. 
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For a long time, the neutrino oscillation data was largely consistent with a zero value 
of #13. However, a while ago, there was a series of experiments indicating a preference 
for a non-zero value of 13 from the MINOS [11], T2K [12], and Double Chooz [13] 
collaborations. These measurements were recently followed by much more precise ones 
from Daya Bay [14] and RENO [15]. 

Due to the degeneracies between oscillation parameters in individual neutrino oscilla- 
tion experiments, there exists a long history of global fits of oscillation data, with some of 
the most recent ones being Refs. [1-3]. These analyses apply standard frequentist meth- 
ods in order to estimate the oscillation parameters and to determine the significance of 
013 > (see also Refs. [16, 17]), but do not include any of the recent reactor data of the 
Double Chooz, Daya Bay, and RENO collaborations. Bayesian estimation of neutrino 
oscillation parameters using subsets of neutrino oscillation data has been performed in 
Refs. [18, 19]. Finally, after the completion of this work, an update of Refs. [1, 2] was made 
public, including the recent reactor data [20]. 

In this work, we describe how to correctly determine the evidence for a non-zero 
value of the leptonic mixing angle 613 and a non-trivial value of the CP-violating Dirac 
phase 5 using Bayesian inference. The well-established method of analyzing these types 
of questions is based on Bayesian model selection, which we will describe in some detail. 
More information on model selection can be found in the textbooks and review articles in 
Refs. [21-25]. Applications within the fields of cosmology and astrophysics can be found 
in Refs. [21, 23, 24, 26-28], while applications in particle physics are given in Refs. [29- 
33]. We will compare the hypothesis $13 = with the hypotheses where #13 can take 
on any value within its physical range, and in this way obtain the preference of a non- 
zero value of 013. After assigning prior probabilities on the different hypotheses, we obtain 
posterior probabilities of the hypotheses that 0i3 > and 613 = 0, respectively. We evaluate 
how recent data has increased step by step the evidence so that it now overwhelmingly 
prefers a non-zero 613 (which is not very surprising, given the recent reactor measurements). 
Furthermore, we automatically obtain the method of evaluating the Bayesian evidence for 
a non-trivial value of the phase 5. Since current data is not very sensitive to the value of 
5, we find, not surprisingly, no evidence neither for nor against CP-violation in neutrino 
oscillations. However, in the future when proposed experiments aiming to search for CP- 
violation (see, for example, Refs. [34-37]) have started giving data, this question will be 
of great importance, and the method described here can be used to assess to which degree 
that data favors or disfavors CP-violation in neutrino oscillations. 

This work is organized as follows. In Sec. 2, we review the principles of Bayesian 
inference, including a self-contained treatment of Bayesian model selection. Sec. 3 describes 
our method of performing model selection for determining the evidence favoring a non-zero 
013 and leptonic CP-violation, and includes model definitions, a thorough discussion on the 
choice of priors, and the approximation of the likelihood we need to make. The numerical 
results are presented, first using the global data analyzed in Ref. [2], and then also including 
the recent Double Chooz, Daya Bay, and RENO data. Finally, a brief summary and 
our conclusions are given in Sec. 4. 
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2 Bayesian inference 



In the Bayesian interpretation, probability is associated with degree of belief. This is in 
contrast to the frequentist interpretation, in which probability is defined as the limit of the 
relative frequency of an event in a large number of repeated trials. 

Bayesian inference is a framework for updating prior belief or knowledge based on 
new information or data. A common problem in data analysis is to use the data to make 
inferences about parameters of a given model, and at a higher level, to decide which 
of two or more competing models is preferred by the data. Bayesian inference answers 
the question how probable a given value of a parameter, or a whole model, is, given the 
observed data. If one is interested in probabilities of parameters within models, as well 
as the models themselves, one is forced to adopt the Bayesian approach. Generally, the 
probability Pr(A\B) represents the degree of belief regarding the truth of A, given B. The 
order of the conditioning can be reversed using Bayes' theorem, 

x Pt(B\A)Pt(A) , . 

Pr(A|5) = Pr (V ( ] 

Often, one is interested to infer values of parameters of a model from a set of obser- 
vations or data. Given a model or hypothesis H with a set of N parameters = {Qi}^ =1 , 
and a set of data D = {-Dj}j =1 , Bayes' theorem implies 1 

, n . x Pr(D|0, H)Pt(&\H) , . 

Pr(@|D, g H >- , (2.2) 

where Pr(0|D, H) is the posterior probability (density) of the parameters 0, given the 
model and the data, and 7r(0) = Pi(@\H) is the prior probability (density). The likeli- 
hood function C(&) = Pr(D|0,ff) is the probability (density) of the data D, assuming 
parameter values 0, while Pr(D|if) is the Bayesian evidence (or model likelihood), which 
is given by 

Z = Pr(D|#) = J Pr(D,Q\H)d N Q = J Pr(D|0, H) Pr(0| J ff)d JV 

= J £(0)vr(0)d 7V 0, (2.3) 

and is simply the factor required to normalize the posterior in Eq. (2.2). Since the evidence 
does not depend on the values of the parameters 0, it is usually ignored in parameter 
estimation problems and the parameter values and uncertainties are obtained using the 
unnormalized posterior. However, the evidence plays a central role in model selection, as 
will be described in Sec. 2.1. 

One important thing to keep in mind is that prior and posterior probability densities, 
in order to keep the total probability invariant, transform under a change of variables 
— > ft = f2(0) by multiplication by the Jacobian determinant, i.e., as 



Pr(fi) = Pr(0) 



(2.4) 



1 All probabilities are also implicitly assumed to be conditioned on all the relevant background information 
I, i.e., Pr(X) is written instead of Pr(X\I). 
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Hence, a prior uniform in one parameter will not be so in a nonlinear function of it, and 
thus the specification of a prior is essentially equivalent to the specification of a variable 
in which the prior is uniform. 

The probability density of any subset rj of the parameters = (rj, p) is obtained by 
integrating over the other parameters p, fully taking into account their uncertainty, as 

PT(rj\X) = Jpr(rj,p\X)dp, (2.5) 

for any X. This makes it possible to eliminate nuisance parameters in a fully consistent 
way by including them in the parameter space and then performing the above integral over 
the posterior distribution. In addition, the probability density of any (unique) function of 
the parameters K = F(&) is obtained as 

Fr(K\X) = J Pr(K\@,X)Pii@\X)d N G = J 5{K - F(0)) Vi(®\X)d N ®. (2.6) 

Although this might look like a daunting integral, if one has access to samples from 
Pr(0|X), one can easily find the total probability in an interval of K by simply binning 
the samples. 

The main result of Bayesian parameter inference is the posterior and its marginalized 
versions (usually in one or two dimensions). However, it is also common to give point 
estimates such as the posterior mean or median, as well as credible intervals (regions), which 
are defined as intervals (regions) containing a certain amount of posterior probability. Note 
that these regions are not unique without further restrictions, just as for classical confidence 
intervals. 

Examples of Bayesian analyses of particle physics models are global fits of supersym- 
metric extensions of the standard model [29, 30, 38-41] and of the minimal universal extra 
dimensions scenario [42]. 

2.1 Model selection 

The discussion in the previous section was concerned with the Bayesian method of inferring 
the values of parameters, assuming a certain hypothesis H to be true. However, another 
arguably much more important question is that of which hypothesis is the best description 
of the data in the first place. It is the evidence in Eq. (2.3) which can be used to distinguish 
between a set of hypotheses {Hi}^ =1 . This is because Bayes' theorem also implies 

PrW|D) = ( ,T, 



and so gives the posterior ratio of probabilities of two hypotheses as 
Pr(^|D) Pr(D|ffi) Pr(^) Z { Pr(^) 



Pr(flj|D) Pr(D|flj) Pr(flj-) ZjYx(E 



(2- 



3> 



where Pr(iJj)/Pr(.H,) is the prior probability ratio of the two models. The ratio of evi- 
dences, Bij = Zi/Zj is often called the Bayes factor. From Eq. (2.3), one observes that 
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log(Zi/2 ) 


Zi/Zq 


Pr(iJi|D) 


Interpretation 


< 1.0 


<3:1 


<0.75 


Inconclusive 


1.0 


-3:1 


~ 0.75 


Weak evidence 


2.5 


~ 12 : 1 


~ 0.92 


Moderate evidence 


5.0 


~ 150 : 1 


~ 0.993 


Strong evidence 



Table 1. Jeffrey's scale often used for the interpretation of Bayes factors and model probabilities. 
The posterior model probabilities are calculated by assuming only two competing hypotheses and 
equal prior probabilities. 

the evidence is the average of the likelihood over the prior, and hence this method auto- 
matically implements a form of Occam's razor, since in general a simpler theory with a 
smaller parameter space will have a larger evidence than a more complicated one, unless 
the latter can fit the data substantially better. More specifically, only the inclusion of 
parameters which are constrained by the data will lead to a smaller evidence, while the 
inclusion of parameters that are unconstrained will leave the evidence unaffected. Since 
Pr(D) = P r (D|-ffi) Pr(flj), Eq. (2.7) can easily be written in terms of the evidences 
and the prior probabilities as 

Pr(^-|D)= 1 (2.9) 

In the simplest case when two models being compared have no free parameters, the 
Bayes factor is simply the likelihood ratio. When the models have free parameters, it is 
still a likelihood ratio, but between the whole models, and this ratio is obtained by inte- 
grating over the parameter spaces of the models as in Eq. (2.3). Note that Bayesian model 
selection allows data to favor the simpler model. Also, it is possible to incorporate external 
information when comparing models. In addition, note that the posterior distributions of 
the parameters do not depend on the overall scale of the likelihood, since the evidence 
scales accordingly. The same holds true for the posterior model probabilities, since the 
ratio of evidences in Eq. (2.9) is also independent of the likelihood normalization. The 
significance of the evidence is usually interpreted using Jeffrey's scale in Tab. 1, as used 
in, for example, Refs. [21, 23, 24, 29, 30]. 

Generally, as more data become available, for any (not completely unreasonable) prior 
distribution, the posterior becomes practically independent of the prior, and determined 
solely by the likelihood. However, the dependence of the evidence on the prior always 
remains, although the Bayes factor will generally favor the correct model once "enough" 
data has been obtained [23] . 

It can be interesting to compare this method with the completely different approach 
used in frequentist inference in the special case of nested models. In the latter case, the 
significance of a new effect is usually evaluated using hypothesis tests [43] . The null hypoth- 
esis Hq of no effect is expressed as rj = rja and is tested against the alternative hypothesis 
r] ^ t]q. A significance level a is chosen and a test statistic with known probability distri- 
bution under Hq is constructed. Using the observed data, an "observed" value of the test 



- 5 - 



statistic and a p-value p is calculated. The null hypothesis is rejected if p < a? Note that 
the p-value is not directly related to the probability of the null hypothesis being true. 
A very commonly used statistic is based on the profile likelihood ratio. Define 

q»(D, ») = -2 log SUP '' ^ • "\ = -2 log (2.10) 

where "sup" denotes the supremum, a single hat denotes the parameters which maximize 
the likelihood, and a double hat indicates the conditional maximum for fixed rjo- Since 
in this case the dependence on the data D is important, it is explicitly shown as an 
argument. Under the assumption that Hq is true, in the large sample limit and under some 
additional conditions, Q 2 (often denoted by A% 2 ) has a ^-distribution with number of 
degrees of freedom equal to the dimensionality of rj. This result is known as Wilks' theorem 
[44]. Frequentist confidence intervals at confidence level 1 — a can then be constructed 
by performing the hypothesis test for all values of the parameter and then including all 
parameter values that are not rejected at a significance a. This close relationship between 
between interval construction and hypothesis testing does not, however, have any analogy 
in Bayesian inference, i.e., there is no direct relation between parameter estimation and 
credible interval construction on the one hand and model selection on the other hand. 
Note that a more complicated model will always be able to fit the data at least as good 
as the simpler model, and hence, unlike in Bayesian model selection, one can never obtain 
evidence in favor of the null hypothesis. 

Sometimes it may happen that a conclusion based on a hypothesis test seems to con- 
tradict that obtained using model selection. This can, for example, happen if the estimate 
of 7] is found far from the value of the null hypothesis and a large value of Q 2 is observed, 
and so the data is unlikely under Hq. However, the data could be even more unlikely under 
the alternative as in Eq. (2.3), and hence the Bayes factor can even favor Hq. Also, the 
prior probability of the alternative could be very low. Thus, taking the Bayesian view also 
means that the significance, or the "number of cr" , of a result is in general not a good indi- 
cator of the importance or the evidence of a new effect, a result that is known as "Lindley's 
paradox". For further details, see for example Appendix A of Ref. [23]. As we will see, 
this situation does, to some extent, apply to the case of the third leptonic mixing angle #13 
when the recent reactor measurements are not taken into account. 

We want to emphasize that the posterior probability density at the special value 77 = 770 
of the alternative model says nothing about the probability that 77 = 770, and neither does 
the relative posterior densities at rj = i]q and the maximum of the posterior. The reason 
is that, if one uses a continuous prior probability density for rj, then the probability that 
77 = ijq is zero for any value of 770- However, for nested models and under some additional 
assumptions, the ratio of the prior to the posterior at 77 = 770 does indicate this probability, 
since it is in fact the Bayes factor [23]. 

Finally, we mention that the Bayesian analysis can be taken even one step further by 
performing model averaging. Using this, the posterior for any set of parameters can be 



2 The p-value is not the significance of the test, but it is the highest significance at which Ho could be 
rejected. This distinction will be implicitly assumed for the rest of this work. 
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calculated, while taking into account the uncertainty regarding which model is the correct 
one. Once again, all one has to do is to use the laws of probability theory to obtain [21, 24] 



r 

Pr(0[X) =J2^(®\Hi,X)PT(Hi\X), (2.11) 

i=l 

i.e., the probability distribution is given by the average of the individual distributions over 
the space of models, with weights equal to the model probabilities. 



3 Bayesian model selection for 6> 13 and 5 
3.1 Model definitions 

As the general framework, we simply take the Standard Model and augment it with a neu- 
trino mass matrix. The nature of the mass matrix (Dirac or Majorana) is not important 
here, but we observe that the weakly interacting neutrino fields v a (a = e, /j, r) are super- 
positions of mass eigenstate fields Vi (i = 1, 2, 3) with masses m^. The neutrino masses mj 
can either have normal (mi < 777-2 < ^3) or inverted (7773 < mi < m.2) ordering. In a basis 
where the charged lepton mass matrix is diagonal we have 



(3.1) 



where U is the leptonic mixing matrix, also referred to as the Pontecorvo-Maki-Nakagawa- 
Sakata (PMNS) matrix, usually parametrized as 
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(3.2) 



where c,- 



cos 6ij and = sin^-, ^12,^23, and ^13 are the lepton mixing angles, 5 is the 
CP-violating Dirac phase, and a and p are CP-violating Majorana phases, which are only 
relevant in the case of Majorana neutrinos. The physical ranges for the mixing angles are 
the intervals [0, 7r/2] [45], while 5 can be restricted to [— 7r,7r], and the Majorana phases to 



[0,tt]. With Am|i 



777n 



m 



\ and Am| 



31 



777o 



m 



neutrino oscillation experiments are 



sensitive to the set of parameters © sc = (Am|i, Am§i, #12, #23) #13 , 5), meaning that these 
are the parameters that the oscillation probabilities depend on. Defining 



fAm^i, Am 2 



31) fl2, 1^23 



) 



(3.3) 
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we distinguish the two CP-conserving cases 5 = and 5 = tt, and define the following 
hypotheses 



#3 



00 = tp, 0i3 = (5 irrelevant), 

0! = (ip,0i 3 ), 0i 3 G [0,vr/2], 5 = 0, 

©2 = (V, ^is), % S [0, vr/2], 5 = tt ( or - tt), 

©3 = W,ei3,5), 0i3 G [0,vr/2], 5 G [-tt.tt]. 



(3.4) 
(3.5) 
(3.6) 
(3.7) 



These hypotheses can then be compared by computing the evidences in Eq. (2.3). For this 
one requires the likelihood function and priors on the oscillation parameters. These will be 
described in Sees. 3.3 and 3.2, respectively. 

The neutrino mass ordering will be treated as fixed, and the very small differences ob- 
tained by assuming the different orderings to be true will be evaluated in Sec. 3.4. However, 
once more data from future experiments searching for CP-violation become available, the 
constraints on the phase 5 is expected to depend very much on the assumed mass ordering 
[46]. In this case, it might be preferable to define separate models for each mass ordering, 
and perform model selection on all eight models simultaneously. 

3.2 Choice of priors 

In any Bayesian analysis, one needs to specify prior probability distributions on the param- 
eters in a given model, and if more than one model is considered, probabilities on the space 
of models. In general, the prior should reflect ones prior knowledge, given the relevant 
background information. However, some difficulties can appear if there is very little back- 
ground information, or if it is difficult to translate this information into mathematically 
precise statements. 

First, we consider the question of how to assign prior model probabilities. A reasonable 
choice could be to use prior probabilities Pt(Hq) = 1/3, Pr(Hi) = Pr(i?2) = 1/6, and 
Pr(i?3) = 1/3. Then, however, the a priori probability of $13 = would only be 1/3 
rather than 1/2, which could be considered more natural. This will, as always, only have a 
non-negligible effect on the posterior probabilities (as it should) when the data is not very 
informative. Even in this case the effect will be small, and actually much smaller than the 
effect coming from the variations in the evidences when different priors of #13 are chosen. 

Then, there is the task of assigning prior probability densities on the continuous pa- 
rameters of the models. To deal with this, a wide variety of methods and rules have been 
developed which one can use to obtain priors in cases when there is little prior information 
(see, for example, Refs. [47, 48]). These methods can serve as a guide to what shapes a 
reasonable distribution might have. However, we take the common point of view that, in 
general, these priors should not be accepted too blindly. In the end, the priors should be 
proper, i.e., normalized to unity, and be a reasonable reflection of one's degree of belief. 
By considering the variation of the posterior inference when a set of different such priors 
is considered, one can check the robustness of the obtained results. 

We assume a prior on the form 
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on the oscillation parameters. Besides being quite natural, this form of the prior can be 
obtained by demanding that the prior must not depend on which basis the neutrino mass 
matrix is defined in. The resulting so-called Haar measures on the leptonic mixing matrix 
have been discussed in Refs. [49, 50], 3 which lead to unique and separable priors on the 
mixing angles and phases. The prior on ip is taken to be the same for all hypotheses. 

Our form of the evidences, derived in Sec. 3.3, will be independent of the prior on ip, and 
hence from now on we only consider the priors on #13 and 5. In principle, one could simply 
make up a list of different reasonable priors, or equivalently variables in which one chooses 
a uniform prior, to investigate. The different choices of priors on #13 considered in this work 
are those derived from the Haar measures on SO(2), U(2), SO(3), and U(3), respectively. 4 
Also, since the experiments most sensitive to #13, i.e., those involving accelerator and 
reactor neutrinos, are mainly sensitive to sin 2 (26*13), one can choose a prior uniform in 
sin 2 (2#i3), in which case Eq. (2.4) is used to extend the prior to the whole physical range 
#13 £ [0, 7r/2]. The different priors can be summarized as 

A: tt(9 13 ) = 2/tt 

B: tt(s 2 13 ) = 1 

C : vr(si 3 ) = 1 

V: vr( C 4 3 ) = l 

£ : 7T (sin 2 (2#i 3 )) = 1, 

where #13 € [0,7r/2] and all other variables are in the interval [0,1]. The corresponding 
implied priors on #13 according to Eq. (2.4) are plotted in Fig. 1, where also the approximate 
region in which most of the contribution to the evidences originates from is marked with a 
grey band. It is important to note that we do not consider any of the studied priors to be 
"better" than the others, but instead consider them all as reasonable, and then evaluate 
how much the posterior inferences depend on the choice of priors. 

Note that, in the rough approximation of a box-shaped likelihood and a constant 
prior within this region, the evidence is proportional to that constant value of the prior. 
Hence, one can estimate how the evidence for models with non-zero #13 will depend on the 
prior chosen. Most importantly, the prior C will give the largest evidence, approximately a 
factor of two to four larger than B, which yields the smallest evidence. This will be checked 
numerically in Sec. 3.4. 

For the model H3, one also needs to specify a prior on 5. However, since the constraints 
on 8 are so weak, not only will its inclusion in the first place effect the evidence very little, 
but also the form of its prior will be largely irrelevant. The Haar measure is uniform in 5, 
and so we use ir(5) = l/(2ir). If, in the future, the data becomes more informative regarding 
the value of 5, one would need to evaluate if that data indeed supports the existence of 
CP- violation in neutrino oscillations, and the general method presented in this work can 
be followed. In this case, one could also evaluate the sensitivity to the choice of the prior 

3 However, note the arising complications due to the unphysical phases discussed in Ref. [50]. 
4 That is, corresponding to real two-flavor, complex two-flavor, real three-flavor, and complex three-flavor 
mixing, respectively. 
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Figure 1. The different correctly normalized priors on 6*13. The grey band marks the approximate 
region from which the main contribution to the evidence originates. 

on 6. Since it is sin<5 and cos<5 which appears in the leptonic mixing matrix in Eq. (3.2), 
and hence also enter into the oscillation probabilities and CP-asymmetries, suitable choices 
could be priors uniform in sin 6 and cos 6. 

3.3 Approximate Bayes factors 

Ideally, one would like to use the full likelihood of all the oscillation parameters, as well 
as the parameterization of the systematic uncertainties. However, since we do not have 
the machinery to do this, we are forced to make some simplifying assumptions. For any 
individual oscillation experiment, the likelihood as a function of all the oscillation parame- 
ters is in general highly degenerate and/or multi-modal. Nevertheless, thanks to the large 
number of different types of experiments, imposing constraints on different combinations 
of oscillation parameters, the likelihood has become unimodal and, at least to some ap- 
proximation, Gaussian. The correlations between the different oscillation parameters in 
the standard parameterization are small, and the best-fit values of the different oscillation 
parameters are largely independent of the values of the other parameters [1-3]. We thus 
neglect any dependence between ip as a whole and the pair (613,6) and approximate the 
likelihood as 



With this approximation, the dependence of ip factorizes out, and the two-dimensional 
marginalized posterior is simply 



(tP)£^ v (9 13 ,6). 



(3.10) 





Furthermore, from Eq. (2.10), one easily obtains 




(3.12) 
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giving 



since the conditional maximum with respect to tp is independent of #13 and 5 in this case. 
The quantity £ CPY (6 13 , 5) is an irrelevant factor which can be set to unity. The likelihood 
£ CPV without recent reactor data (we denote this data set by GF) is evaluated using 
Eq. (3.13), and a map of Q 2 in the #13 — S plane obtained from the authors of Ref. [2]. 5 

Since the numbers of events in the reactor experiments Double Chooz (DC), Daya 
Bay (DB), and RENO are quite large [13-15], the likelihoods are taken as Gaussian 
functions in the mean number of events, or equivalently, in sin 2 (2#i3), 

sin 2 (2(9i 3 ) = 0.086 ±0.0508 (DC), (3.14) 
sin 2 (26li3) = 0.092 ±0.0177 (DB), (3.15) 
sin 2 (26li3) = 0.113 ±0.0230 (RENO), (3.16) 

where the systematic errors 6 have been integrated over using Gaussian distributions, giv- 
ing the resulting effective likelihoods as Gaussians with unchanged means and deviation 
parameters added in quadrature. 

From Eq. (2.3) and with the approximation in Eq. (3.10), one finds 



Z = J £(^,0,0)vr(V)d^ 

~ / £ N (^)7rWW-/: CPV (0,0), (3.17) 

£ N (</07r(V>)dV> • [ C CPY (9 l3 ,5Me 13 ,5)d9 13 d5, (3.18) 



(3.19) 

(3.20) 
(3.21) 



so that the ratio of the evidences is given by 

Z3 / £ CPV (gig, S)tt(6 13 ,6)d9 l3 d6 
Z ~ £ CPV (0,0) 

In a similar way, one obtains 

2k jc CPV (9 13 ,o)7r(e l3 )de 13 

Z - £CPV( 0)0 ) 
Z 2 _ f£ CPY (6 13 ,7r)ir(e 13 )d9 13 



Z ~ £CPV( 0)0 ) 

Although the approximation in Eq. (3.10) certainly introduces some error on the calculated 
evidences, it is expected to be much smaller than the uncertainty coming from the choice 
of prior distributions, which is often relatively large. 



J This corresponds to the "recommended" analysis of Ref. [2], which includes the short-baseline reactor 
data. 

6 We assume the systematic errors to be independent. Although this is not really correct, it should be a 
sufficiently good approximation for our purposes. 
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The required integrals can, due to the simple form of the likelihood £ CPV and the low 
dimensionality, be evaluated using standard integration routines. For more complicated 
evidence integrals, more specific algorithms such as MultiNest [51, 52], which we also use 
in this work, are required. 

3.4 Numerical results 

Our reconstructed two-dimensional posteriors using the data analyzed in Ref. [2], assuming 
H% to be true, and using prior S, are shown in Fig. 2 for normal (left) and inverted (right) 
mass orderings. The 68 % and 95 % equal-posterior credible regions are marked by the 
black contours. Using different priors on #13 results in very similar posteriors. In Fig. 3, the 
posteriors including all the recent reactor data are shown. Besides the obvious observation 
that now sin 2 (2^13) is much better constrained, one notices that the preferred values of 
5 have changed. Although the added reactor data is completely insensitive to the value 
of 8, the degeneracy between 5 and sin 2 (2^13) in the other data results in changes of the 
preferred values. The maximum likelihood estimates are now 5 ~ tt and 5 ~ for normal 
and inverted mass orderings, respectively. The one-dimensional marginalized posteriors 
are shown in Fig. 4. Our obtained maximum likelihood estimates and 68 % confidence 
intervals of sin 2 (2#i3) when using all available data are sin 2 (2#i3) = 0.090 ± 0.012 and 
sin 2 (2#i3) = 0.092±0.013 for normal and inverted mass ordering, respectively. 7 Evaluating 
the posterior means and 68 % credible intervals yields essentially the same values. 

Note that the peaks of these posteriors, which are obtained assuming that #13 is non- 
zero, are quite far from #13 = 0, but in order to quantify the evidence of a non-zero #13 one 
has to perform model selection. Due to the small amount of information regarding the value 
of 5, the evidences for the models H\,H2, and #3 are very similar, | log(i?2,3/-2i)| < 0.2 
without Daya Bay and RENO data, and | log (.2^,3 /-2i) I ^ 0.5 when including all data. 8 
The difference in the logarithms of the Bayes factors when assuming the different mass 
orderings to be true (around 0.2 in log evidence) can be used as an estimate of the error 
on these logarithms induced by assuming the wrong mass ordering. 

First, the analysis is done using only the data analyzed in Ref. [2]. After that, we 
calculate how the evidence for #13 > changes when the Double Chooz, Daya Bay, 
and RENO result are added, respectively. In Tab. 2, the evidences Z ~ Z\ ~ Z2 c± Z% 
compared to Zq are shown. The ranges given are those obtained using the different priors 
on 0i3, but also the smaller variations coming from assuming different mass orderings are 
included. 9 Furthermore, the posterior probability of Ho is shown, calculated assuming that 
Pt(Hq) = 1/2 and Pr(f/o) = 1/3 (in the parentheses), respectively. Note that, due to the 
similarity of the evidences Z\,Zi, and Z% of the other models, the posterior probability of 
Hq is independent of the prior probabilities of these models for fixed Pt(Hq). In the last 
column, the square root of the test statistic Q 2 is given for testing Hq against Hi (but with 

7 The more careful analysis of Ref. [20] do not use the short-baseline reactor data and finds somewhat 
larger preferred values of #13. 

8 This does not depend on the prior on 613. 

9 The numerical errors on all logarithms of Bayes factors in this work are about 0.05 or smaller and can 
hence safely be neglected. 
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Data used 


log(Z/Z ) 


Z/Zq 


Pr(#o|D) 




GF 


0.4 - 2.1 


1.6 - 7.9 


0.11(0.06) -0.39(0.24) 


3.1 


GF + DC 


1.7-3.1 


5-22 


0.04(0.02) - 0.15(0.08) 


3.5 


GF + DC + DB 


14.2 - 15.4 


(1.5 - 5) • 10 6 


< 7 • 10" 7 


6.1 


All 


25.3 - 26.5 


(1 - 3) • 10 11 


< io- 11 


7.8 


Only DB 


9.1 - 10.3 


(9 - 30) • 10 3 


< IO" 4 


5.2 



Table 2. Summary of Baycs factors and posterior probabilities of Ho for Pi-(Hq) = 1/2 (and 
Ph(Hq) = 1/3 within parenthesis). The ranges of Z ~ Z\ ~ Z% — Z3 given are those obtained 
using the different priors on #13, but also the smaller variation coming from the assumed mass 
ordering is included. The last column gives the square root of the observed Q 2 (only Hq against 
Hi), which would equal the "number of a" under the conditions of Wilks' theorem. 

very small differences if testing against the other models), which would equal the "number 
of a" under the conditions of Wilks' theorem. 

First, we only use the data included in the global fit of Ref. [2] (again denoted by 
GF). Although there is more than a 3<7 significance, there is in fact only weak evidence of 
#13 > 0, with logarithms of the Bayes factors lying in the range 0.4 — 2.1 for the different 
priors on #13. The posterior probability of Hq, for this range of evidences, is in the range 
0.11-0.39 if Pr(fio) = 1/2 and in the range 0.06-0.24 if Pr(JT ) = 1/3. One can then also 
include the variation of the posterior probability of Hq coming from the variation of Pr(Ho) 
between 1/3 and 1/2 to obtain the range 0.06 — 0.39 for the posterior of Hq, including all 
discussed uncertainties. As discussed in Sec. 3.2, the prior B yields the smallest evidence 
due to the fact that it puts very little prior probability into the high-likelihood region (see 
Fig. 1). 

When adding the Double Chooz measurements, the evidence for a non-zero #13 
increases. However, there is still no conclusive evidence. All logarithms of Bayes factors 
increase by about 1 or slightly more for the different priors, yielding the range 1.7 — 3.1 
for \og(Z / Zq) and posterior probabilities of Hq (when including all uncertainties as in the 
previous paragraph) in the range 0.02 — 0.15. The significance found is about 3.5a. 

Only the Daya Bay data by itself yields a significance of 5.2er, and also the Bayesian 
evidence strongly prefers a non-zero #13. The logarithms of Bayes factors are now all 
larger than 9, equivalent to posterior probabilities of Hq smaller than 10~ 4 . When the 
Daya Bay data is combined with the previous data, we find that \og(Z/ Zq) > 14, yielding 
posterior probabilities smaller than 7T0 -7 for all prior combinations. The total significance 
is about 6.1<7. 

Finally, when all the available data is included, we find that the significance is about 
7.8<7, and the evidences \og{Z / Zq) > 25, giving posterior probabilities of Hq smaller than 
10~ n for all combinations of priors. 

We observe that, for the first two cases in Tab. 2, the p-values corresponding to the 
observed values of Q 2 (which, again, are directly unrelated to the model probabilities) are 
rather small, while the posterior probabilities of Hq are still significantly non-zero. This is 
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0.05 , 0.1 0.15 0.05 , 0.1 0.15 



sin^(29 13 ) sin^2e i3 ) 

Figure 2. Two-dimensional posterior distributions for sin 2 (2^i3) and 8, using uniform priors on 
these variables and the data analyzed in Ref. [2]. Left (right) plots are for normal (inverted) 
neutrino mass ordering. The two black curves enclose 68 % and 95 % of the posterior probability, 
respectively 



0.5 



-0.5 



-1 




0.05 , 0.1 0.15 0.05 , 0.1 0.15 



sin'(28 13 ) sin*(26 13 ) 



Figure 3. Same as Fig. 2, but using all available data. 



the automatic Occam's razor effect of model selection at work: although the hypotheses 
with non-zero #13 can accommodate significantly better fits, this is expected since these 
hypotheses have more freedom in their predictions, and this lack of predictivity is punished 
when using model selection. Although the maximum likelihoods are very large, the average 
ones are not, and these are what matters in model selection. Hence, the simpler model 
with #13 = 0, being more predictive, is not strongly disfavored. The situations are different 
for the three cases in the last three rows of Tab. 2. There, the likelihoods are so much 
larger for the alternative models that they are very strongly preferred, even though they 
are still less predictive and hence punished by the Occam's razor effect. 

Since the posterior probability of Hq is so incredibly small when including all data 
and in practice completely independent of any prior assumptions, we conclude that #13 is 
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0.05 . 0.1 0.15 

sin 2 (26 13 ) 
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5/ji 

Figure 4. One-dimensional posterior distributions of sin 2 (26*13) and S for different combinations 
of mass orderings and data sets. 

non-zero with a probability practically equal to one. The next question is then if there is 
evidence of CP-violation in neutrino oscillations, i.e., a non-trivial value of the phase 6. 
Since the three models H\,H2, and H3 all have 6*13 as a free parameter, the resulting Bayes 
factors will to a very good approximation be independent of the prior on #13. Using all 
available data, we obtain the Bayes factors for normal mass ordering 

log(Z 2 /Z 1 ) ~ 0.5, log(Z 3 /Z 1 ) ~ 0.2, (3.22) 

while for the inverted ordering, the result is 

\og{Z 2 /Z l ) ~ -0.2, log(Z 3 /£i) ~ -0.3. (3.23) 

Hence, there is evidence neither for nor against CP-violation in neutrino oscillations. This 
was of course expected since the likelihood depends very weakly on 5. If one assigns prior 
probabilities as in Sec. 3.2, i.e., Pr(F ) = l/3,Pr(Hi) = Pr(F 2 ) = 1/6, and Pr(iJ 3 ) = 1/3, 
the posterior probability of H3 is very close to 0.5 for both mass orderings (| ~Pr(H 3 \D) — 
0.5| < 0.05), while the rest of the probability is shared between the H\ and H 2 in slightly 
different proportions for the different mass orderings. 

4 Summary and conclusions 

We have described how to determine the evidence of a non-zero value of the leptonic mixing 
angle 6*13 and a non-trivial CP-violating Dirac phase from neutrino oscillation data, using 
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Bayesian model selection. After having given a short, and hopefully clear, summary of 
the principles of Bayesian inference in general, and model selection in particular, we have 
applied it to the case of #13 and S. 

We have compared the hypothesis #13 = with hypotheses where #13 can take on 
any value within its physical range and, by calculating the Bayesian evidences, obtained 
the evidence favoring a non-zero #13. After assigning prior probabilities to the different 
hypotheses, we have calculated their posterior probabilities. We have shown how recent 
data has step by step increased the evidence disfavoring #13 = 0, so that there is now 
overwhelming evidence of #13 being non-zero. 

Although the strong preference for a non-zero #13 was expected, given the recent reactor 
data, this analysis also serves other purposes. Most importantly, we have described the 
Bayesian way of evaluating to what extent oscillation data supports the existence of CP- 
violation in neutrino oscillations through a non-trivial value of the phase 5. Since there is 
not much sensitivity to the value of 5 in current data, we find, not surprisingly, no evidence 
neither for nor against CP-violation. However, in the future, when proposed experiments 
aiming to measure CP-violation have started taking data, this question will be of great 
importance. The method described here can be used to assess to which degree that future 
data favors CP-violation in neutrino oscillations, and act as an important complement to 
standard x 2 -analyses. I n fact, it would be very interesting to evaluate how much proposed 
experiments would be able to provide (Bayesian) evidence of CP-violation (following, for 
example, Refs. [53-55]). 

Finally, we note that Bayesian model selection could also be applied to other problems 
related to neutrino oscillations. For example, it could be used to decide which mass ordering 
of the neutrinos is preferred. To do this one would essentially need the full likelihood of 
all the oscillation parameters and make sure that one uses the same normalization of the 
likelihood throughout. Since future constraints on the phase 5 is expected to depend very 
much on the assumed mass ordering, the best approach might be to define separate models 
for each assumption on 5 and for each mass ordering, and then to perform model selection 
on all models simultaneously. In addition, one could use model selection to investigate 
whether 623 is maximal or not, or if oscillation data requires the introduction of sterile 
neutrinos. 
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